Audio: MFCC: More updates and topologies to run MFCC for Mel spectrogram audio features in SDW PCs#10750

Open
singalsu wants to merge 11 commits into
thesofproject:mainfrom
singalsu:mfcc_use_32bit_fft_mel

Collaborator

@singalsu singalsu commented May 7, 2026

This PR contains more updates for MFCC.

  • Mel audio feature accuracy is improved by using the 32-bit Q9.23 format. The previous 16-bit Q9.7 format left very few significant bits for the normalized log10 Mel values, which are small and lie in the range -1 to +1.
  • Fixes for several issues, described in the individual commit messages.
  • Adds a topology for SDW devices that runs MFCC in a branched microphone capture pipeline. The example topology below has MFCC for both the headset microphone and the notebook's built-in microphone.
[Image: topology graph sof-arl-cs42l43-l0-cs35l56-l23-mfcc]
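
The resolution difference between the two formats can be illustrated with a small host-side sketch. The helper names `float_to_q()` and `q_to_float()` are hypothetical and not part of SOF; they only model "N fractional bits" fixed-point rounding for values that fit the format.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to signed fixed point with frac_bits fractional
 * bits, rounding to nearest. Hypothetical helper, not SOF API. The caller
 * must keep x inside the representable range of the format.
 */
static int32_t float_to_q(double x, int frac_bits)
{
	assert(frac_bits >= 0 && frac_bits <= 23);
	double scaled = x * (double)(1 << frac_bits);

	return (int32_t)(scaled + (scaled >= 0 ? 0.5 : -0.5));
}

/* Convert a fixed-point integer back to a real value. */
static double q_to_float(int32_t q, int frac_bits)
{
	assert(frac_bits >= 0 && frac_bits <= 23);
	return (double)q / (double)(1 << frac_bits);
}
```

With these helpers, the value 0.3 is stored as 38 in Q9.7 (read back as 0.296875, an error of about 0.003), while in Q9.23 it is stored as 2516582 (error below 1e-7). One Q9.7 step is 1/128, so a signal confined to -1 to +1 only spans about 256 distinct codes.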

@singalsu singalsu force-pushed the mfcc_use_32bit_fft_mel branch from 40bb97f to 1768663 Compare May 12, 2026 11:44
@singalsu singalsu changed the title from "Audio: MFCC: Use 32 bit FFT and Mel frequency scale filters for better precision" to "Audio: MFCC: More updates and topologies to run MFCC for Mel spectrogram audio features in SDW PCs" May 12, 2026
@singalsu singalsu marked this pull request as ready for review May 12, 2026 11:55
@singalsu singalsu requested a review from ranj063 as a code owner May 12, 2026 11:55
Copilot AI review requested due to automatic review settings May 12, 2026 11:55
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the MFCC feature extraction path to improve Mel log precision (moving key Mel outputs to 32-bit Q9.23) and adds SoundWire (SDW) topology support for branched “audio features capture” pipelines (MFCC/Mel output alongside normal capture).

Changes:

  • Switch Mel filterbank 32-bit output to Q9.23 (int32) and propagate this through MFCC processing and tuning utilities.
  • Add new topology2 pipeline/class and SDW platform includes to expose MFCC/Mel “audio features capture” PCMs for jack and DMIC.
  • Refactor MFCC tune scripts (run script + MATLAB/Octave decoders) to handle multiple bit depths and Xtensa runs.

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 12 comments.

| File | Description |
|------|-------------|
| tools/topology/topology2/platform/intel/sdw-jack-generic.conf | Inserts a module-copier stage in the jack capture path to act as a branch point for audio-feature capture. |
| tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf | New SDW jack MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline. |
| tools/topology/topology2/platform/intel/sdw-dmic-audio-feature.conf | New SDW DMIC MFCC/Mel capture PCM and routes into the new SRC→MFCC pipeline. |
| tools/topology/topology2/include/pipelines/cavs/host-gateway-src-mfcc-capture.conf | New reusable pipeline class intended to perform SRC then MFCC then host capture. |
| tools/topology/topology2/include/common/common_definitions.conf | Adds feature flags to gate SDW jack/DMIC audio-feature capture includes. |
| tools/topology/topology2/development/tplg-targets.cmake | Adds new SDW topology build targets enabling MFCC audio-feature capture. |
| tools/topology/topology2/cavs-sdw.conf | Includes the new pipeline class and gates inclusion of new SDW audio-feature capture platform snippets. |
| test/cmocka/src/math/auditory/auditory.c | Updates unit test to accommodate 32-bit Mel log output and compares against legacy reference after downscaling. |
| src/math/auditory/mel_filterbank_32.c | Changes psy_apply_mel_filterbank_32() output from int16 Q9.7 to int32 Q9.23. |
| src/include/sof/math/fft.h | Adds icomplex16 include (header dependency fix). |
| src/include/sof/math/auditory.h | Updates psy_apply_mel_filterbank_32() signature to int32 output. |
| src/include/sof/audio/mfcc/mfcc_comp.h | Forces MFCC to 32-bit FFT path and extends state for 32-bit Mel log storage/output pointers. |
| src/audio/mfcc/tune/run_mfcc.sh | Refactors MFCC tuning runner into reusable functions and adds optional Xtensa testbench execution. |
| src/audio/mfcc/tune/README.txt | Updates tuning documentation to match new output files and decode workflow. |
| src/audio/mfcc/tune/decode_mel.m | Extends Mel decoder to support s16/s24/s32 formats and raw/wav reading. |
| src/audio/mfcc/tune/decode_all.m | New helper to decode/plot all generated MFCC/Mel outputs in one go. |
| src/audio/mfcc/mfcc.c | Simplifies prepare logging; removes a sink buffer size check. |
| src/audio/mfcc/mfcc_setup.c | Adjusts setup behavior for sample rate mismatch; adds scratch allocation for 32-bit Mel log output and updates free paths. |
| src/audio/mfcc/mfcc_hifi4.c | Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling. |
| src/audio/mfcc/mfcc_hifi3.c | Removes duplicate fft-fill implementation; adjusts windowing and S24 input conversion handling. |
| src/audio/mfcc/mfcc_generic.c | Removes duplicate fft-fill implementation. |
| src/audio/mfcc/mfcc_common.c | Implements shared fft-fill routine; updates Mel processing to use/maintain Q9.23 and updates s24/s32 Mel-only output behavior. |
| src/audio/mfcc/Kconfig | Switches MFCC to select 32-bit Mel filterbank support. |
| scripts/rebuild-testbench.sh | Exports XTENSA_PATH in generated Xtensa environment setup script. |
Comments suppressed due to low confidence (1)

src/audio/mfcc/Kconfig:13

  • MFCC is now hard-coded to use 32-bit FFT (MFCC_FFT_BITS=32), but this Kconfig only selects MATH_FFT (which defaults to 16-bit FFT support) and does not select MATH_32BIT_FFT. This can lead to link/build failures when fft_execute_32() isn’t compiled. Select MATH_32BIT_FFT here (or make MFCC_FFT_BITS configurable and select the matching FFT width).
	select CORDIC_FIXED
	select MATH_32BIT_MEL_FILTERBANK
	select MATH_AUDITORY
	select MATH_DCT
	select MATH_DECIBELS
	select MATH_FFT
	select MATH_MATRIX
	select MATH_WINDOW

{
source src.$index.1
sink mfcc.$index.1
}
Collaborator Author

It is a common convention in pipeline classes to leave the last widget (the copier) unconnected, so that the upper-level topology can add widgets to the pipeline if needed. Also, the copier index seems to be the PCM ID.

Comment thread tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf Outdated
Comment thread tools/topology/topology2/platform/intel/sdw-jack-audio-feature.conf
Comment thread src/include/sof/math/auditory.h
Comment thread src/audio/mfcc/mfcc_setup.c Outdated
Comment thread src/audio/mfcc/tune/decode_mel.m Outdated
Comment thread src/audio/mfcc/tune/decode_mel.m
Comment thread src/audio/mfcc/tune/decode_mel.m
singalsu added 11 commits May 12, 2026 15:22
Change the Mel filterbank 32-bit variant psy_apply_mel_filterbank_32()
output from int16_t Q9.7 (was wrongly commented as Q8.7) to int32_t
Q9.23 format for improved signal resolution.

The output parameter type is changed from int16_t* to int32_t* in both
the implementation and the header declaration.

The auditory unit test is updated to allocate int32_t output and
convert Q9.23 to Q9.7 for comparison against existing reference
vectors.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The input samples must first be shifted left logically, so that the
sample's sign bit lands in the sign bit of the 32-bit word, and then
shifted right arithmetically into place for the 16-bit saturation
instruction to work correctly. This fixes a possible overflow with
large input values.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
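
The shift sequence described in this commit can be modelled in plain C. This is a scalar sketch under assumptions, not the HiFi3/HiFi4 code from the patch: the function name `s24_to_s16_sat()` is hypothetical, the input is assumed to be a 24-bit sample in a 32-bit container whose upper byte may hold anything, and the rounding step is an illustrative choice.

```c
#include <stdint.h>

/* Scalar model: sign-extend a 24-bit sample, then round and saturate it
 * to 16 bits. Relies on arithmetic right shift of negative values, which
 * is implementation-defined in C but holds on common compilers.
 */
static int16_t s24_to_s16_sat(int32_t sample)
{
	/* Logical left shift moves bit 23 (the 24-bit sign) to bit 31 and
	 * discards whatever the upper byte of the container held.
	 */
	int32_t x = (int32_t)((uint32_t)sample << 8);

	/* Arithmetic right shift restores a properly sign-extended value */
	x >>= 8;

	/* Round to 16 bits, then saturate */
	x = (x + 128) >> 8;
	if (x > INT16_MAX)
		x = INT16_MAX;
	if (x < INT16_MIN)
		x = INT16_MIN;

	return (int16_t)x;
}
```

Without the first shift, an input such as 0x00800000 (the most negative 24-bit value without sign extension) would be treated as a large positive number, which is the kind of overflow the commit message refers to.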
Remove the duplicate AE_MULFP32X16X2RS_H call in the 32-bit FFT path
of mfcc_apply_window(). Its result was immediately overwritten by the
AE_MULFP32X16X2RS_L call on the next line, making it dead code.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
This patch switches MFCC_FFT_BITS from 16 to 32 to use 32-bit FFT
mode for better precision in the MFCC processing pipeline.

In cepstral mode (num_ceps > 0), the 32-bit Q9.23 Mel output from
psy_apply_mel_filterbank_32() is converted to 16-bit Q9.7 before the
existing 16-bit DCT calculation, preserving the current DCT and
cepstral lifter behavior.

In Mel-only mode, output format depends on sink format:
- s16: Q9.7 (current format, backwards compatible)
- s24: Q9.15 (one int32_t per Mel value)
- s32: Q9.23 (full precision, one int32_t per Mel value)
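
The three output encodings listed above differ only in how many fractional bits are kept. A minimal sketch of that mapping follows; the enum, the function name `mel_q9_23_to_sink()`, and the round-to-nearest with saturation are assumptions for illustration and may differ from what mfcc_common.c does.

```c
#include <stdint.h>

/* Hypothetical sink format selector, not the SOF frame format enum */
enum mel_sink_format { MEL_SINK_S16, MEL_SINK_S24, MEL_SINK_S32 };

/* Convert one Q9.23 Mel value to the encoding used for the sink format */
static int32_t mel_q9_23_to_sink(int32_t mel, enum mel_sink_format fmt)
{
	int shift;
	int64_t v, max;

	switch (fmt) {
	case MEL_SINK_S16:
		shift = 16;	/* Q9.23 -> Q9.7 */
		break;
	case MEL_SINK_S24:
		shift = 8;	/* Q9.23 -> Q9.15 */
		break;
	default:
		return mel;	/* s32 keeps Q9.23 unchanged */
	}

	/* Round to nearest, then saturate to the narrower word length */
	v = ((int64_t)mel + ((int64_t)1 << (shift - 1))) >> shift;
	max = ((int64_t)1 << (31 - shift)) - 1;
	if (v > max)
		v = max;
	if (v < -max - 1)
		v = -max - 1;

	return (int32_t)v;
}
```

For example, the Q9.23 value for 1.0 (8388608) becomes 128 for an s16 sink, 32768 for an s24 sink, and stays 8388608 for an s32 sink.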

The mel_log_32 scratch buffer is placed after power_spectra in the
fft_buf scratch area. A bounds check is added in mfcc_setup() to fail
if num_mel_bins exceeds the available scratch space.
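
A bounds check of this kind could look like the sketch below. The function name `mel_log_32_fits()` and its parameters are hypothetical; the only things taken from the patch are the layout (mel_log_32 directly after power_spectra in the FFT scratch buffer) and the use of 32-bit elements for both arrays.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if power_spectra (fft_size / 2 + 1 words) followed by
 * mel_log_32 (num_mel_bins words) fit into a scratch buffer of
 * fft_buffer_size bytes. Hypothetical helper, not the SOF check itself.
 */
static bool mel_log_32_fits(size_t fft_buffer_size, size_t fft_size,
			    size_t num_mel_bins)
{
	size_t power_bytes = (fft_size / 2 + 1) * sizeof(int32_t);
	size_t mel_bytes = num_mel_bins * sizeof(int32_t);

	if (power_bytes > fft_buffer_size)
		return false;

	return mel_bytes <= fft_buffer_size - power_bytes;
}
```

With a 512-point FFT and 23 Mel bins, power_spectra needs 1028 bytes and mel_log_32 needs 92 bytes, so a 4096-byte buffer passes while a 1100-byte buffer would be rejected. Passing the real buffer size in bytes, rather than deriving it from an assumed element size, keeps the check valid for both FFT widths.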

The decode_mel.m Octave script is updated with s24 and s32 format
support for the changed output encoding.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
When MFCC_FFT_BITS is 32, the HiFi3/4 mfcc_fill_fft_buffer() used
AE_S16_0_XP to write 16-bit samples into 32-bit icomplex32 containers.
This left the upper 16 bits of .real with stale data and .imag unzeroed,
causing corrupted FFT input after the first frame when scratch buffers
are reused for power_spectra and mel_log_32.

Replace all platform-specific implementations with a single generic C
version in mfcc_common.c. The function performs only data copying with
no arithmetic, so HiFi intrinsics provide very little benefit. The new
implementation uses conditional pointer types (int16_t for 16-bit FFT,
int32_t for 32-bit FFT) with matching element stride, and relies on
the caller's bzero of fft_buf to keep imaginary parts zero.
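
The idea of the generic fill can be sketched as follows for the 32-bit FFT case. The struct and function names are hypothetical stand-ins, and the sketch clears the buffer itself to stay self-contained, whereas the patch relies on the caller having cleared fft_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct icomplex32, defined here only for the sketch */
struct icomplex32_sketch {
	int32_t real;
	int32_t imag;
};

/* Copy count 16-bit samples into the real parts starting at index start */
static void fft_fill_sketch(struct icomplex32_sketch *fft_buf, size_t fft_size,
			    const int16_t *samples, size_t count, size_t start)
{
	size_t i;

	assert(start + count <= fft_size);

	/* Stands in for the caller's bzero(): clears stale data and .imag */
	memset(fft_buf, 0, fft_size * sizeof(*fft_buf));

	/* A full 32-bit store sign-extends the sample, so no stale upper
	 * bits survive from an earlier use of the scratch memory.
	 */
	for (i = 0; i < count; i++)
		fft_buf[start + i].real = samples[i];
}
```

Writing the whole 32-bit word is what avoids the failure described above, where a 16-bit store left the upper half of .real untouched.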

Add missing icomplex16.h include to fft.h. The header uses struct
icomplex16 in struct fft_plan but did not include its definition.

After psy_apply_mel_filterbank_16() writes Q9.7 int16_t values to
mel_spectra->data, convert to Q9.23 in mel_log_32 so that all
downstream processing (dynamic mmax, clamping, scaling, DCT) works
correctly in 16-bit FFT mode.

Fix mel_log_32 scratch space check to use fft_buffer_size instead of
assuming sizeof(icomplex32) per element, which overestimated available
space by 2x in 16-bit mode.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
In 32-bit FFT mode the input data is 16-bit stored in the lower half
of a 32-bit icomplex32 container. The AE_MULFP32X16X2RS_L intrinsic
performs a Q1.31 x Q1.15 fractional multiply, so the 16-bit sample
must first be shifted left by 16 to Q1.31 format. Without this shift
the multiply treats the value as having 16 zero fractional bits,
producing near-zero windowed output and a corrupt FFT result.

Add the missing AE_SLAI32S(sample, 16) before the multiply in both
HiFi3 and HiFi4 mfcc_apply_window() 32-bit paths, matching the
generic C implementation.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
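
The effect of the missing shift can be reproduced with a plain C model of a Q1.31 x Q1.15 fractional multiply. The function name `window_sample_32()` is hypothetical, and the rounding and saturation here are a generic model rather than a bit-exact description of the HiFi intrinsic.

```c
#include <stdint.h>

/* Window one 16-bit sample held in the low half of a 32-bit word.
 * win_q15 is the window coefficient in Q1.15; the result is Q1.31.
 */
static int32_t window_sample_32(int32_t sample, int16_t win_q15)
{
	/* Move the Q1.15 sample to Q1.31 before multiplying */
	int32_t s = (int32_t)((uint32_t)sample << 16);

	/* Q1.31 x Q1.15 gives 46 fractional bits; round back to Q1.31 */
	int64_t p = (int64_t)s * win_q15;

	p = (p + (1 << 14)) >> 15;
	if (p > INT32_MAX)
		p = INT32_MAX;
	if (p < INT32_MIN)
		p = INT32_MIN;

	return (int32_t)p;
}
```

For a sample of 0.5 (16384) and a window value of 0.5 (16384), this returns 536870912, which is 0.25 in Q1.31. If the left shift is skipped, the same product rounds to 8192, roughly 3.8e-6 in Q1.31, which matches the near-zero windowed output described in the commit message.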
Add missing cleanup for fft_plan. After mod_fft_plan_new() succeeds,
failures in window setup and mel filterbank initialization jumped to
free_fft_out, leaking the fft_plan. Add free_fft_plan label and route
these error paths through it.

Add missing cleanup for lifter.matrix. Late validation checks
(mel_log_32 space, output capacity) jumped to free_dct_matrix,
skipping the lifter matrix that may have been allocated. Add
free_lifter label for these paths.

Replace rfree() with mod_free() in all error cleanup labels to match
the mod_zalloc() allocations and the existing mfcc_free_buffers()
implementation.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
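
The cleanup ordering described in this commit follows the usual goto-unwind pattern. The sketch below is a host-side illustration only: `sketch_alloc()`, `sketch_free()`, `setup_sketch()` and the `fail_step` parameter are hypothetical, and malloc()/free() stand in for mod_zalloc()/mod_free(). The label names mirror the ones mentioned in the commit message.

```c
#include <stdlib.h>

static int live_allocs;	/* outstanding allocations, for the demo only */

static void *sketch_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void sketch_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* fail_step: 0 = no failure, 1 = fail after the FFT plan exists,
 * 2 = fail in late validation after the lifter exists.
 */
static int setup_sketch(int fail_step)
{
	void *fft_out, *fft_plan, *dct_matrix, *lifter;
	int ret = -1;

	fft_out = sketch_alloc();
	if (!fft_out)
		return ret;

	fft_plan = sketch_alloc();
	if (!fft_plan)
		goto free_fft_out;

	/* Window or Mel filterbank failure must also release the plan */
	if (fail_step == 1)
		goto free_fft_plan;

	dct_matrix = sketch_alloc();
	if (!dct_matrix)
		goto free_fft_plan;

	lifter = sketch_alloc();
	if (!lifter)
		goto free_dct_matrix;

	/* Late validation failure must release the lifter as well */
	if (fail_step == 2)
		goto free_lifter;

	ret = 0;	/* the demo frees everything even on success */

free_lifter:
	sketch_free(lifter);
free_dct_matrix:
	sketch_free(dct_matrix);
free_fft_plan:
	sketch_free(fft_plan);
free_fft_out:
	sketch_free(fft_out);
	return ret;
}
```

Each label releases one resource and falls through to the labels for everything allocated earlier, so an error path only has to jump to the label matching the most recent successful allocation. Unlike this demo, a real component would keep the buffers on success and release them later.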
Refactor run_mfcc.sh into functions for input conversion and testbench
execution to reduce code duplication. Add Xtensa testbench support when
XTENSA_PATH environment variable is set, producing xt_ prefixed output
files.

Add decode_all.m Octave script to decode and plot all MFCC cepstral
and Mel spectrogram output files from run_mfcc.sh, including Xtensa
variants.

Update README.txt to document the current run_mfcc.sh output files,
Xtensa support, and decode_all.m usage.

Export XTENSA_PATH in rebuild-testbench.sh so that run_mfcc.sh can
find the Xtensa toolchain path for the testbench build.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The checks previously done in prepare() are done in the
module adapter.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
The module-copier makes it possible to branch the capture pipeline
for different processing. In this patch series the module-copier is
added so that audio features extraction can run from the shared
headset microphone endpoint.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Add a new host-gateway-src-mfcc-capture pipeline class that chains SRC
(48 kHz to 16 kHz) with the MFCC component for audio features extraction.
Two new platform configuration files are added:

- sdw-jack-audio-feature.conf: taps the SoundWire jack capture path
  (module-copier 11) into an SRC+MFCC pipeline (pipeline 130, PCM 47)
- sdw-dmic-audio-feature.conf: taps the SoundWire DMIC capture path
  (module-copier 41) into an SRC+MFCC pipeline (pipeline 131, PCM 48)

Both are gated by new IncludeByKey defines
SDW_JACK_AUDIO_FEATURE_CAPTURE and SDW_DMIC_AUDIO_FEATURE_CAPTURE
(default false) in cavs-sdw.conf.

Development topology targets are added for MTL rt713 and ARL
cs42l43+cs35l56 configurations with MFCC features capture enabled.

Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

src/audio/mfcc/Kconfig:13

  • COMP_MFCC now hard-codes 32-bit MFCC processing (MFCC_FFT_BITS=32) and the code calls fft_execute_32(), but this Kconfig only selects MATH_FFT (which defaults to 16-bit support) and does not select MATH_32BIT_FFT. With default Kconfig values this can lead to missing 32-bit FFT objects at link time. Please select MATH_32BIT_FFT here (or make MFCC_FFT_BITS conditional on CONFIG_MATH_32BIT_FFT).
config COMP_MFCC
	tristate "MFCC component"
	depends on COMP_MODULE_ADAPTER
	select CORDIC_FIXED
	select MATH_32BIT_MEL_FILTERBANK
	select MATH_AUDITORY
	select MATH_DCT
	select MATH_DECIBELS
	select MATH_FFT
	select MATH_MATRIX
	select MATH_WINDOW

Comment thread src/audio/mfcc/mfcc.c
Comment on lines 178 to 182
 	/* get sink data format and period bytes */
 	sink_format = audio_stream_get_frm_fmt(&sinkb->stream);
 	sink_period_bytes = audio_stream_period_bytes(&sinkb->stream, dev->frames);
-	comp_info(dev, "source_format = %d, sink_format = %d",
-		  source_format, sink_format);
-	if (audio_stream_get_size(&sinkb->stream) < sink_period_bytes) {
-		comp_err(dev, "sink buffer size %d is insufficient < %d",
-			 audio_stream_get_size(&sinkb->stream), sink_period_bytes);
-		ret = -ENOMEM;
-		goto err;
-	}
+	comp_info(dev, "source_format = %d, sink_format = %d", source_format, sink_format);

 	cd->config = comp_get_data_blob(cd->model_handler, &data_size, NULL);
Comment on lines +254 to +258
"true" "platform/intel/sdw-jack-audio-feature.conf"
}

IncludeByKey.SDW_DMIC_AUDIO_FEATURE_CAPTURE {
"true" "platform/intel/sdw-dmic-audio-feature.conf"
SDW_JACK_AUDIO_FEATURE_CAPTURE_PCM_NAME "Jack In Audio Features"
SDW_JACK_AUDIO_FEATURE_CAPTURE_PCM_ID 47
SDW_JACK_AUDIO_FEATURE_CAPTURE_STREAM_NAME "Jack In Audio Features Stream"
SDW_JACK_AUDIO_FEATURE_CAPTURE_PIPELINE_ID 130
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_NAME "Microphone Audio Features"
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PCM_ID 48
SDW_DMIC_AUDIO_FEATURE_CAPTURE_STREAM_NAME "Microphone Audio Features Stream"
SDW_DMIC_AUDIO_FEATURE_CAPTURE_PIPELINE_ID 131
Comment on lines +47 to +59
Object.Base.input_audio_format [
	{
		in_bit_depth		32
		in_valid_bit_depth	32
		in_rate			48000
	}
]
Object.Base.output_audio_format [
	{
		out_bit_depth		32
		out_valid_bit_depth	32
		out_rate		16000
	}

-	/* Convert powerspectrum to Mel band logarithmic spectrum */
-	mat_init_16b(state->mel_spectra, 1, state->dct.num_in, 7); /* Q8.7 */
+	/* Convert powerspectrum to Mel band logarithmic spectrum Q9.23 */
Comment on lines +291 to 294
* +-------------------------------------+------------------+
* | 3. power_spectra[], | 6. mel_log_32[], |
* | 32 bits, e.g. x257 -> 1028 bytes | 32b, 92 bytes |
* +-------------------------------------+------------------+